LINQ to Objects provides standard query operators for working with sets of elements within collections. These operators allow two collections (containing the same type of element) to be combined into a single collection in various ways.
The set operators all use deferred execution, meaning that they do no work when called and produce each result element only as the result is iterated. Each operator is detailed in this section, including its method signatures.
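Deferred execution is easiest to see by changing a source collection after the query has been defined. The following is a minimal sketch; the class name DeferredExecutionDemo and the sample values are illustrative assumptions, not part of the original listings.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredExecutionDemo
{
    // Builds a Concat query, then changes the first list before enumerating.
    public static List<int> Run()
    {
        var first = new List<int> { 1, 2 };
        var second = new List<int> { 4, 5 };

        // Nothing is enumerated here; q only remembers its two sources.
        IEnumerable<int> q = first.Concat(second);

        // Added after the query was defined, but before it is iterated.
        first.Add(3);

        // Enumeration happens now, so the late addition is included.
        return q.ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", Run())); // 1 2 3 4 5
    }
}
```

If the query result must not change when the sources change, it can be captured at a point in time by calling ToList or ToArray on it.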
Concat Operator
Concat combines the contents of two collections. It operates by looping over the first collection and yielding each element, then looping over the second collection and yielding each element. Elements that appear in both collections are returned more than once; if that is not the desired behavior, consider using the Union operator instead. An ArgumentNullException is thrown if either collection is null when this operator is called.
Concat has a single overload with the following method signature:
// Combines the contents of two collections.
// Returns all elements from the first collection,
// then all elements from the second collection.
IEnumerable<TSource> Concat<TSource>(
    this IEnumerable<TSource> first,
    IEnumerable<TSource> second);
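The behavior described above (loop over the first collection, then the second, with null arguments rejected at call time) can be sketched as an extension method. This is an illustration under those stated assumptions, not the framework's source code; the names ConcatSketch, MyConcat, and ConcatIterator are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ConcatSketch
{
    // Hypothetical stand-in for Concat; the name MyConcat is ours.
    public static IEnumerable<T> MyConcat<T>(
        this IEnumerable<T> first, IEnumerable<T> second)
    {
        // Argument checks run immediately, when the method is called.
        if (first == null) throw new ArgumentNullException("first");
        if (second == null) throw new ArgumentNullException("second");
        return ConcatIterator(first, second);
    }

    // The iterator body runs only when the result is enumerated.
    private static IEnumerable<T> ConcatIterator<T>(
        IEnumerable<T> first, IEnumerable<T> second)
    {
        foreach (T item in first) yield return item;
        foreach (T item in second) yield return item;
    }

    public static void Main()
    {
        int[] first = { 1, 2, 3 };
        int[] second = { 3, 4, 5 };
        Console.WriteLine(string.Join(" ", first.MyConcat(second))); // 1 2 3 3 4 5
    }
}
```

Splitting the method in two is a deliberate choice in this sketch: if the null checks lived inside the iterator body, they would not run until the first element was requested.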
Listing 1 demonstrates the simplest use of the Concat operator and the subtle difference between Concat and Union. The Console output from this example is shown in Output 1.
Listing 1. Simple example showing the difference between Concat and Union—see Output 1
int[] first = new int[] { 1, 2, 3 };
int[] second = new int[] { 3, 4, 5 };

// concat returns elements from both collections
var q = first.Concat(second);

Console.WriteLine(
    "Concat example: 1,2,3 concatenated with 3,4,5 - ");

foreach (var item in q)
    Console.Write(item + " ");

// union returns the distinct concat of the collections
var q1 = first.Union(second);

Console.WriteLine();
Console.WriteLine(
    "Union example: 1,2,3 unioned with 3,4,5 - ");

foreach (var item in q1)
    Console.Write(item + " ");
Output 1.
Concat example: 1,2,3 concatenated with 3,4,5 -
1 2 3 3 4 5
Union example: 1,2,3 unioned with 3,4,5 -
1 2 3 4 5
A useful application of the Concat operator when binding a sequence to a control is adding an extra entry at the start or end as a placeholder. For example, to make the first entry in a bound sequence the text "-- none chosen --", the code in Listing 2 can be used, with the result shown in Figure 1.
Listing 2. Using the Concat operator to add values to a sequence—see Figure 1
// the actual list of status
string[] status = new string[] {
    "Not Started", "Started", "Complete" };

// the desired first entry
string[] prompt = new string[] {
    "-- none chosen --"};

ComboBox combo = new ComboBox();

// this is where the two sequences get
// combined and bound to the combobox
combo.DataSource = prompt.Concat(status).ToList();

// display resulting combo in test form.
Form form = new Form();
form.Controls.Add(combo);
form.ShowDialog();
Distinct Operator
The Distinct operator removes duplicate elements from a sequence using either the default EqualityComparer or a supplied EqualityComparer. It
operates by iterating the source sequence and returning each element of
equal value once, effectively skipping duplicates. An ArgumentNullException is thrown if the source collection is null when this operator is called.
The method signatures available for the Distinct operator are:
// Returns unique elements from the source collection
// Uses EqualityComparer<TSource>.Default
// to compare elements for uniqueness.
IEnumerable<TSource> Distinct<TSource>(
    this IEnumerable<TSource> source);
// Returns unique elements from the source collection
// Uses the supplied comparer to compare
// elements for uniqueness.
IEnumerable<TSource> Distinct<TSource>(
    this IEnumerable<TSource> source,
    IEqualityComparer<TSource> comparer);
Listing 3 demonstrates how to use the Distinct operator to remove duplicate entries from a collection. It also shows how to use the built-in string comparers to perform culture-aware, case-sensitive or case-insensitive comparisons. The Console output from this example is shown in Output 2.
Listing 3. Example showing how to use the Distinct operator—this example also shows the various built-in string comparer statics—see Output 2
int[] source = new int[] { 1, 2, 3, 1, 2, 3, 4, 5 };

// Distinct de-duplicates a collection
var q = source.Distinct();

Console.WriteLine(
    "Distinct example: 1, 2, 3, 1, 2, 3, 4, 5 - ");

foreach (var item in q)
    Console.Write(item + " ");

// distinct on string using comparer
string[] names = new string[]
    { "one", "ONE", "One", "Two", "Two" };

/* built-in string comparer statics are helpful.
 * See the topic heading later in this chapter -
 * Custom EqualityComparers When Using LINQ Set Operators,
 * Built-in String Comparers
 */
var q1 = names.Distinct(
    StringComparer.CurrentCultureIgnoreCase);

Console.WriteLine();
Console.WriteLine(
    "Distinct example: one, ONE, One, Two, Two - ");

foreach (var item in q1)
    Console.Write(item + " ");
Output 2.
Distinct example: 1, 2, 3, 1, 2, 3, 4, 5 -
1 2 3 4 5
Distinct example: one, ONE, One, Two, Two -
one Two
Except Operator
The Except operator produces the set difference between two sequences. It returns only those elements in the first sequence that do not appear in the second sequence, compared using either the default EqualityComparer or a supplied EqualityComparer. It operates by first obtaining a distinct list of the elements in the second sequence, then iterating the first sequence and returning only the elements that do not appear in that list. Because the result is a set, an element that occurs more than once in the first sequence is returned only once. An ArgumentNullException is thrown if either collection is null when this operator is called.
The method signatures available for the Except operator are:
// Returns the elements from the source sequence
// that are NOT in the second collection using the
// EqualityComparer<TSource>.Default comparer to
// compare elements.
IEnumerable<TSource> Except<TSource>(
    this IEnumerable<TSource> first,
    IEnumerable<TSource> second);
// Returns the elements from the source sequence
// that are NOT in the second collection using the
// supplied comparer to compare elements.
IEnumerable<TSource> Except<TSource>(
    this IEnumerable<TSource> first,
    IEnumerable<TSource> second,
    IEqualityComparer<TSource> comparer);
Listing 4 shows the most basic example of using the Except operator. The Console output from this example is shown in Output 3.
Listing 4. The Except operator returns all elements in the first sequence, not in the second sequence—see Output 3
int[] first = new int[] { 1, 2, 3 };
int[] second = new int[] { 3, 4, 5 };

// Except returns all elements from the first
// collection that are not in the second collection.
var q = first.Except(second);

Console.WriteLine(
    "Except example: 1,2,3 Except with 3,4,5 - ");

foreach (var item in q)
    Console.Write(item + " ");
Output 3.
Except example: 1,2,3 Except with 3,4,5 -
1 2
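One detail of Except that is easy to miss is that the result is a set: duplicates within the first sequence are collapsed as well. The short sketch below illustrates this with made-up values; the class name ExceptDemo is an assumption for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExceptDemo
{
    // The first sequence contains 1 and 2 twice; 3 also appears in second.
    public static List<int> Run()
    {
        int[] first = { 1, 1, 2, 3, 2 };
        int[] second = { 3, 4, 5 };

        // 3 is excluded because it is in second;
        // the repeated 1 and 2 are each returned only once.
        return first.Except(second).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", Run())); // 1 2
    }
}
```

If every occurrence from the first sequence needs to be kept, a Where clause that tests membership in the second sequence is one alternative to consider.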
Intersect Operator
The Intersect
operator produces a sequence of elements that appear in both
collections. It operates by skipping any element in the first
collection that cannot be found in the second collection using either
the default EqualityComparer or a supplied EqualityComparer. An ArgumentNullException is thrown if either collection is null when this operator is called.
The method signatures available for the Intersect operator are:
// Returns the elements from the source collection
// that ARE ALSO in the second collection using the
// EqualityComparer<TSource>.Default comparer to
// compare elements.
IEnumerable<TSource> Intersect<TSource>(
    this IEnumerable<TSource> first,
    IEnumerable<TSource> second);
// Returns the elements from the source collection
// that ARE ALSO in the second collection using the
// supplied comparer to compare elements.
IEnumerable<TSource> Intersect<TSource>(
    this IEnumerable<TSource> first,
    IEnumerable<TSource> second,
    IEqualityComparer<TSource> comparer);
Listing 5 shows the most basic use of the Intersect operator. The Console output from this example is shown in Output 4.
Listing 5. Intersect operator example—see Output 4
int[] first = new int[] { 1, 2, 3 };
int[] second = new int[] { 3, 4, 5 };

// intersect returns only elements from the first
// collection that are ALSO in the second collection.
var q = first.Intersect(second);

Console.WriteLine(
    "Intersect example: 1,2,3 Intersect with 3,4,5 - ");

foreach (var item in q)
    Console.Write(item + " ");
Output 4.
Intersect example: 1,2,3 Intersect with 3,4,5 -
3
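When a comparer is supplied, two elements can be "equal" without being identical, so it matters which collection the returned element comes from. As described above, Intersect walks the first collection, so the value from the first collection is the one returned. The sketch below uses hypothetical fruit names and the class name IntersectDemo purely for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class IntersectDemo
{
    // Case-insensitive intersection of two string arrays.
    public static List<string> Run()
    {
        string[] first = { "Apple", "pear", "APPLE" };
        string[] second = { "apple", "kiwi" };

        // "Apple" matches "apple" under the comparer and is returned
        // with the casing used in first; the later "APPLE" counts as a
        // duplicate of it and is skipped.
        return first.Intersect(second,
            StringComparer.OrdinalIgnoreCase).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(" ", Run())); // Apple
    }
}
```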
Union Operator
The Union operator returns the distinct elements from both collections. The result is similar to the Concat operator, except the Union
operator will only return an equal element once, rather than the number
of times that element appears in both collections. Duplicate elements
are determined using either the default EqualityComparer or a supplied EqualityComparer. An ArgumentNullException is thrown if either collection is null when this operator is called.
The method signatures available for the Union operator are:
// Combines the contents of two collections.
// Returns all elements from the first collection,
// then all elements from the second collection.
// Duplicate elements are removed (only the first
// occurrence is returned).
// Uses EqualityComparer<TSource>.Default
// to compare elements for uniqueness.
IEnumerable<TSource> Union<TSource>(
    this IEnumerable<TSource> first,
    IEnumerable<TSource> second);
// Combines the contents of two collections.
// Returns all elements from the first collection,
// then all elements from the second collection.
// Duplicate elements are removed (only the first
// occurrence is returned).
// Uses the supplied comparer to compare elements
// for uniqueness.
IEnumerable<TSource> Union<TSource>(
    this IEnumerable<TSource> first,
    IEnumerable<TSource> second,
    IEqualityComparer<TSource> comparer);
Listing 1 demonstrated the subtle difference between the Union and Concat operators. Use Union when each unique element should be returned only once (duplicates removed), and Concat when every element from both sequences is wanted.
Listing 6 demonstrates a useful technique for combining data from multiple source types by unioning (or concatenating, excepting, intersecting, or distincting, for that matter) data from either a collection of Contact elements or CallLog elements based on a user's partial input. This feature is similar to the incremental lookup offered by many smartphones, in which the user enters either a name or a phone number, and a drop-down displays recent numbers and searches the stored contacts for likely candidates. The technique works because of how .NET manages equality for projected anonymous types. The key is to ensure that the projected fields in each anonymous type are identical in name, case, type, and order. If these conditions are satisfied, the compiler treats the projections as the same anonymous type, and they can be operated on by any of the set-based operators.
Listing 6. Anonymous types with the same members can be unioned and concatenated—see Output 5
// lookup recent phone number OR contact first and last
// names to incrementally build a convenient picklist on
// partial user entry (narrow the list as data is typed).
string userEntry = "Ka";

var q = (
    // userEntry is contact name
    from contact in Contact.SampleData()
    where contact.FirstName.StartsWith(userEntry)
       || contact.LastName.StartsWith(userEntry)
    select new { Display = contact.FirstName + " " +
                           contact.LastName }).Distinct()

    .Union(

    // userEntry is partial phone number
    (from call in CallLog.SampleData()
     where call.Number.Contains(userEntry)
        && call.Incoming == false
     select new { Display = call.Number }).Distinct()

    );

Console.WriteLine(
    "User Entry - " + userEntry);

foreach (var item in q)
    Console.WriteLine(item.Display);
Output 5.
User Entry - Ka
Stewart Kagel
Mack Kamph

User Entry - 7
165 737 1656
546 607 5462
848 553 8487
278 918 2789
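The anonymous-type technique can be tried without the Contact and CallLog sample classes. The following sketch substitutes two plain string arrays (hypothetical stand-ins for the name and phone number sources) and projects both into an anonymous type with a single Display property, so that Union can compare the projected values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnonymousUnionDemo
{
    public static List<string> Run()
    {
        // Stand-ins for the two data sources; values are illustrative only.
        string[] contactNames = { "Stewart Kagel", "Mack Kamph" };
        string[] recentEntries = { "165 737 1656", "Mack Kamph" };

        // Same property name, type, and order: both projections
        // produce the same anonymous type.
        var names = contactNames.Select(n => new { Display = n });
        var recent = recentEntries.Select(r => new { Display = r });

        // Anonymous types compare by property values, so the second
        // "Mack Kamph" is recognized as a duplicate and dropped.
        return names.Union(recent)
                    .Select(item => item.Display)
                    .ToList();
    }

    public static void Main()
    {
        foreach (string display in Run())
            Console.WriteLine(display);
    }
}
```

Renaming the property in only one of the projections (for example to Text) would give two different anonymous types, and the call to Union would no longer compile.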
Custom EqualityComparers When Using LINQ Set Operators
LINQ's set operators rely on an IEqualityComparer<T> to determine whether two elements are equal. When no equality comparer is specified, the default comparer for the element type is used, obtained from the static Default property on the generic EqualityComparer<T> type. For example, the following two statements are equivalent for the Distinct operator when first is a sequence of strings (and the same holds for all of the set operators):
first.Distinct();
first.Distinct(EqualityComparer<string>.Default);
For
programming situations where more control is needed for assessing
equality, a custom comparer can be written, or one of the built-in
string comparisons can be used.
Built-in String Comparers
Listing 3 introduced an example that showed case-insensitive matching of strings using the Distinct operator. It simply passed in a static instance of a built-in comparer type using the following code:
var q1 = names.Distinct(
StringComparer.CurrentCultureIgnoreCase);
In
addition to the string comparer used in this example, there are a
number of others that can be used for a particular circumstance. Table 1 lists the available built-in static properties that can be called on the StringComparer type to get an instance of that comparer.
Table 1. Built-in String Comparers
CurrentCulture | Case-sensitive string comparison using the word comparison rules of the current culture. Current culture is determined by the System.Threading.Thread.CurrentThread.CurrentCulture property. |
CurrentCultureIgnoreCase | Case-insensitive string comparison using the word comparison rules of the current culture. Current culture is determined by the System.Threading.Thread.CurrentThread.CurrentCulture property. |
InvariantCulture | Case-sensitive string comparison using the word comparison rules of the invariant culture. |
InvariantCultureIgnoreCase | Case-insensitive string comparison using the word comparison rules of the invariant culture. |
Ordinal | Case-sensitive ordinal string comparison. |
OrdinalIgnoreCase | Case-insensitive ordinal string comparison. |
In addition to choosing between case-sensitive and case-insensitive string comparison, it is important to consider the culture sensitivity of the strings being compared and how the results of the comparison are displayed to the user.
Microsoft has recommended guidelines documented on MSDN (see http://msdn.microsoft.com/en-us/library/kzwcbskc.aspx). These generally state that if the result of a sort is going to be displayed to an end user (sorting items in a listbox, for example), culture-sensitive sorting (the built-in comparers starting with CurrentCulture) should be used. Culture-insensitive comparison (the built-in comparers starting with InvariantCulture) should be used when strings are compared internally and the result should not depend on the end user's culture settings, for example when comparing XML tokens in a file with those needed for processing in an application. Microsoft's string comparison guidance also recommends the Ordinal comparers for this kind of internal, culture-agnostic matching.
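The effect of the comparer choice can be checked by counting the distinct results. The sketch below reuses the names array from Listing 3 and compares the default comparer with StringComparer.OrdinalIgnoreCase; the class and method names are illustrative assumptions.

```csharp
using System;
using System.Linq;

public static class StringComparerDemo
{
    private static readonly string[] Names =
        { "one", "ONE", "One", "Two", "Two" };

    // Default comparer: strings differing only by case are different.
    public static int CountCaseSensitive()
    {
        return Names.Distinct().Count();
    }

    // Ordinal, case-insensitive: "one", "ONE", and "One" collapse.
    public static int CountIgnoringCase()
    {
        return Names.Distinct(StringComparer.OrdinalIgnoreCase).Count();
    }

    public static void Main()
    {
        Console.WriteLine(CountCaseSensitive()); // 4
        Console.WriteLine(CountIgnoringCase());  // 2
    }
}
```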
Building and Using a Custom EqualityComparer Type
Listing 7. Using a custom equality comparer with the Distinct operator
// find number of phonetic common names in list
string[] names = new string[] {
    "Janet", "Janette", "Joanne", "Jo-anne", "Johanne",
    "Katy", "Katie", "Ralph", "Ralphe" };

var q = names.Distinct(
    new SoundexEqualityComparer());

Console.WriteLine("Number of unique phonetic names = {0}",
    q.Count());
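Listing 7 depends on a SoundexEqualityComparer type that is not shown in this section. The following is one possible sketch of such a comparer, assuming a simplified American Soundex encoding (first letter plus up to three digits); it is not the author's implementation, and the Encode and DigitFor helpers are names introduced here for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical comparer: two strings are equal when their
// simplified Soundex codes match.
public class SoundexEqualityComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y)
    {
        return Encode(x) == Encode(y);
    }

    public int GetHashCode(string obj)
    {
        // Must agree with Equals: equal codes give equal hash codes.
        return Encode(obj).GetHashCode();
    }

    // Returns a four-character code such as "J530" for "Janet".
    public static string Encode(string value)
    {
        if (value == null) return string.Empty;
        var code = new StringBuilder();
        char previous = '0';
        foreach (char letter in value.ToUpperInvariant())
        {
            if (letter < 'A' || letter > 'Z') continue; // skip hyphens, spaces
            char digit = DigitFor(letter);
            if (code.Length == 0)
            {
                code.Append(letter);      // the first letter is kept as-is
                previous = digit;
                continue;
            }
            if (digit == '-') continue;   // H and W do not break a run
            if (digit != '0' && digit != previous) code.Append(digit);
            previous = digit;             // a vowel ('0') resets the run
            if (code.Length == 4) break;
        }
        return code.ToString().PadRight(4, '0');
    }

    private static char DigitFor(char letter)
    {
        switch (letter)
        {
            case 'B': case 'F': case 'P': case 'V': return '1';
            case 'C': case 'G': case 'J': case 'K':
            case 'Q': case 'S': case 'X': case 'Z': return '2';
            case 'D': case 'T': return '3';
            case 'L': return '4';
            case 'M': case 'N': return '5';
            case 'R': return '6';
            case 'H': case 'W': return '-';
            default: return '0';          // vowels and Y
        }
    }
}
```

With this sketch, the nine names in Listing 7 reduce to four codes (J530, J500, K300, and R410), so the Distinct query would count four unique phonetic names. The important design point for any custom comparer is that GetHashCode is derived from the same value that Equals compares; the set operators use the hash code first and only call Equals on elements whose hash codes match.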